The recursive neural network
This paper describes a special type of dynamic neural network called the Recursive Neural Network (RNN). The RNN is a single-input single-output nonlinear dynamical system with three subnets: a nonrecursive subnet and two recursive subnets. The nonrecursive subnet feeds current and previous input samples through a multi-layer perceptron with second order input units (SOMLP) [9]. In a similar fashion the two recursive subnets feed back previous output signals through SOMLPs. The outputs of the three subnets are summed to form the overall network output. The purpose of this paper is to describe the architecture of the RNN, to derive a learning algorithm for the network based on a gradient search, and to provide some examples of its use. The work in this paper is an extension of previous work on the RNN [10]. In previous work the RNN contained only two subnets, a nonrecursive subnet and a recursive subnet. Here we have added a second recursive subnet. In addition, both of the subnets in the previous RNN had linear input units. Here all three of the subnets have second order input units. In many cases this allows the RNN to solve problems more efficiently, that is, with a smaller overall network. In addition, the use of the RNN for inverse modeling and control was never fully developed in the past. Here, for the first time, we derive the complete learning algorithm for the case where the RNN is used in the general model following configuration. This configuration includes the following as special cases: system modeling, nonlinear filtering, inverse modeling, nonlinear prediction and control.
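The three-subnet structure described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the history lengths, hidden sizes, and how the two recursive subnets differ are free choices left open by the abstract and are assumed here.

```python
import numpy as np

def somlp(x, W1, b1, w2, b2):
    # Second-order input units: augment x with all pairwise products x_i * x_j
    pairs = np.outer(x, x)[np.triu_indices(len(x))]
    z = np.concatenate([x, pairs])
    h = np.tanh(W1 @ z + b1)        # hidden layer
    return float(w2 @ h + b2)       # scalar subnet output

def make_somlp_params(n_in, n_hidden, rng):
    # Weight shapes account for the second-order augmentation of the input
    n_aug = n_in + n_in * (n_in + 1) // 2
    return (rng.standard_normal((n_hidden, n_aug)) * 0.1,
            np.zeros(n_hidden),
            rng.standard_normal(n_hidden) * 0.1,
            0.0)

def rnn_output(u_hist, y_hist_a, y_hist_b, params):
    # Overall output = nonrecursive subnet (current/past inputs)
    #                + two recursive subnets (past outputs), summed
    p_nr, p_ra, p_rb = params
    return (somlp(u_hist, *p_nr)
            + somlp(y_hist_a, *p_ra)
            + somlp(y_hist_b, *p_rb))

rng = np.random.default_rng(0)
params = (make_somlp_params(4, 6, rng),   # nonrecursive subnet
          make_somlp_params(3, 6, rng),   # recursive subnet 1
          make_somlp_params(3, 6, rng))   # recursive subnet 2
y = rnn_output(np.ones(4) * 0.5, np.zeros(3), np.zeros(3), params)
```

In use, the two output-history vectors would be filled with previous values of `y`, closing the feedback loops.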
Using State-of-the-Art Speech Models to Evaluate Oral Reading Fluency in Ghana
This paper reports on a set of three recent experiments utilizing large-scale
speech models to evaluate the oral reading fluency (ORF) of students in Ghana.
While ORF is a well-established measure of foundational literacy, assessing it
typically requires one-on-one sessions between a student and a trained
evaluator, a process that is time-consuming and costly. Automating the
evaluation of ORF could support better literacy instruction, particularly in
education contexts where formative assessment is uncommon due to large class
sizes and limited resources. To our knowledge, this research is among the first
to examine the use of the most recent versions of large-scale speech models
(Whisper V2 and wav2vec2.0) for ORF assessment in the Global South.
We find that Whisper V2 produces transcriptions of Ghanaian students reading
aloud with a Word Error Rate (WER) of 13.5. This is close to the model's average WER
on adult speech (12.8) and would have been considered state-of-the-art for
children's speech transcription only a few years ago. We also find that when
these transcriptions are used to produce fully automated ORF scores, they
closely align with scores generated by expert human graders, with a correlation
coefficient of 0.96. Importantly, these results were achieved on a
representative dataset (i.e., students with regional accents, recordings taken
in actual classrooms), using a free and publicly available speech model out of
the box (i.e., no fine-tuning). This suggests that using large-scale speech
models to assess ORF may be feasible to implement and scale in lower-resource,
linguistically diverse educational contexts.
NASA Near Earth Network (NEN), Deep Space Network (DSN) and Space Network (SN) Support of CubeSat Communications
There has been a historical trend to increase capability and drive down the Size, Weight and Power (SWAP) of satellites, and that trend continues today. Small satellites, including systems conforming to the CubeSat specification, because of their low launch and development costs, are enabling new concepts and capabilities for science investigations across multiple fields of interest to NASA. NASA scientists and engineers across many of NASA's Mission Directorates and Centers are developing exciting CubeSat concepts and welcome potential partnerships for CubeSat endeavors. From a communications and tracking point of view, small satellites including CubeSats are a challenge to coordinate because of existing small spacecraft constraints, such as limited SWAP and attitude control, low power, and the potential for high numbers of operational spacecraft. The NASA Space Communications and Navigation (SCaN) Program's Near Earth Network (NEN), Deep Space Network (DSN) and Space Network (SN) are customer-driven organizations that provide comprehensive communications services for space assets, including data transport between a mission's orbiting satellite and its Mission Operations Center (MOC). The NASA NEN consists of multiple ground antennas. The SN consists of a constellation of geosynchronous (Earth-orbiting) relay satellites, named the Tracking and Data Relay Satellite System (TDRSS). The DSN currently makes available 13 antennas at its three tracking stations located around the world for interplanetary communication. The presentation will analyze how well these space communication networks are positioned to support the emerging small satellite and CubeSat market.
Recognizing the potential support, the presentation will review the basic capabilities of the NEN, DSN and SN in the context of small satellites and will present information about NEN-, DSN- and SN-compatible flight radios and antenna development activities at the Goddard Space Flight Center (GSFC) and across industry. The presentation will review concepts on how the SN multiple access capability could help locate CubeSats and provide a low-latency early warning system. The presentation will also present how the DSN is evolving to maximize use of its assets for interplanetary CubeSats. The critical spectrum-related topics of available and appropriate frequency bands, licensing, and coordination will be reviewed. Other key considerations, such as standardization of radio frequency interfaces and of flight and ground communications hardware systems, will be addressed, as such standardization may reduce the amount of time and cost required to obtain frequency authorization and perform compatibility and end-to-end testing. Examples of standardization that exist today are the NASA NEN, DSN and SN systems, which have published users' guides and defined frequency bands for high data rate communication, as well as conformance to CCSDS standards. The workshop session will also seek input from the workshop participants to better understand the needs of small satellite systems and to identify key development activities and operational approaches necessary to enhance communication and navigation support using NASA's NEN, DSN and SN.
Fixed Points in Two-Neuron Discrete Time Recurrent Networks: Stability and Bifurcation Considerations
The position, number and stability types of fixed points of a two-neuron
recurrent network with nonzero weights are investigated. Using simple
geometrical arguments in the space of derivatives of the sigmoid transfer
function with respect to the weighted sum of neuron inputs, we partition
the network state space into several regions corresponding to stability
types of the fixed points. If the neurons have the same mutual
interaction pattern, i.e. they either mutually inhibit or mutually excite
each other, a lower bound on the rate of convergence of the attractive
fixed points towards the saturation values, as the absolute values of the
weights on the self-loops grow, is given. The role of weights in the location
of fixed points is explored through an intuitively appealing
characterization of neurons according to their inhibition/excitation
performance in the network. In particular, each neuron can be of one of
the four types: greedy, enthusiastic, altruistic or depressed. Both with
and without the external inhibition/excitation sources, we investigate the
position and number of fixed points according to the character of the neurons.
When both neurons self-excite (or self-inhibit) and have the
same mutual interaction pattern, the mechanism of creation of a new
attractive fixed point is shown to be that of saddle node bifurcation.
(Also cross-referenced as UMIACS-TR-95-51)
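The stability type of a fixed point of such a network can be checked numerically from the Jacobian of the sigmoid map. The sketch below assumes the common update x(t+1) = sigmoid(W x(t) + b), which may differ in detail from the paper's formulation; the weights chosen here (self-excitation with mutual excitation) are purely illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def step(x, W, b):
    # One update of a two-neuron discrete-time recurrent network
    return sigmoid(W @ x + b)

def jacobian_at(x, W):
    # At a fixed point x = sigmoid(z), sigmoid'(z) = x * (1 - x),
    # so the Jacobian of the map is diag(x * (1 - x)) @ W
    s = x * (1.0 - x)
    return np.diag(s) @ W

# Both neurons self-excite and mutually excite each other
W = np.array([[6.0, 1.0], [1.0, 6.0]])
b = np.array([-3.0, -3.0])

x = np.array([0.9, 0.9])
for _ in range(200):                 # iterate toward a fixed point
    x = step(x, W, b)

eigs = np.linalg.eigvals(jacobian_at(x, W))
stable = bool(np.all(np.abs(eigs) < 1.0))  # attractive iff spectral radius < 1
```

With large self-loop weights the fixed point sits near the saturation values of the sigmoid, where x(1-x) is tiny, which is the intuition behind the convergence-rate bound mentioned above.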
Learning a Class of Large Finite State Machines with a Recurrent Neural Network
One of the issues in any learning model is how it scales with problem
size. Neural networks have not been immune to scaling issues. We show that a
dynamically-driven discrete-time recurrent network (DRNN) can learn rather
large grammatical inference problems when the strings of a finite memory
machine (FMM) are encoded as temporal sequences. FMMs are a subclass of finite
state machines which have a finite memory or a finite order of inputs and
outputs. The DRNN that learns the FMM is a neural network that maps directly
from the sequential machine implementation of the FMM. It has feedback only
from the output and not from any hidden units; an example is the recurrent
network of Narendra and Parthasarathy. (FMMs that have zero order in the
feedback of outputs are called definite memory machines and are analogous to
Time-delay or Finite Impulse Response neural networks.) Due to their
topology, these DRNNs are at least as powerful as any sequential machine
implementation of a FMM and should be capable of representing any FMM. We
choose to learn ``particular FMMs.'' Specifically, these FMMs have a large
number of states (simulations are for … and … state FMMs) but have
minimal order, relatively small depth and little logic when the FMM is
implemented as a sequential machine. Simulations for the number of training
examples versus generalization performance and FMM extraction size show that
the number of training samples necessary for perfect generalization
is less than that necessary to completely characterize the FMM to be learned.
This is in a sense a best case learning problem since any arbitrarily chosen
FMM with a minimal number of states would have much more order and string
depth and most likely require more logic in its sequential machine
implementation.
(Also cross-referenced as UMIACS-TR-94-94)
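The finite-memory property — each output determined by a bounded window of recent inputs and outputs — can be illustrated with a small simulator. The `logic` function and the orders used here are illustrative, not the FMMs from the paper.

```python
from collections import deque

def run_fmm(inputs, logic, in_order, out_order, init_out=0):
    """Simulate a finite memory machine: the output at time t is a
    function of the last `in_order` inputs and last `out_order` outputs."""
    in_hist = deque([0] * in_order, maxlen=in_order)
    out_hist = deque([init_out] * out_order, maxlen=out_order)
    outputs = []
    for x in inputs:
        in_hist.append(x)                       # shift in the new input
        y = logic(tuple(in_hist), tuple(out_hist))
        out_hist.append(y)                      # feed the output back
        outputs.append(y)
    return outputs

# Example: running parity, an order-(1,1) FMM -- current input XOR last output
xor_logic = lambda ins, outs: ins[-1] ^ outs[-1]
run_fmm([1, 0, 1, 1], xor_logic, 1, 1)  # → [1, 1, 0, 1]
```

A DRNN with output feedback, as described above, learns exactly this kind of map from the tapped input and output histories.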
Learning Long-Term Dependencies is Not as Difficult With NARX Recurrent Neural Networks
It has recently been shown that gradient descent learning algorithms for
recurrent neural networks can perform poorly on tasks that involve long-
term dependencies, i.e. those problems for which the desired output
depends on inputs presented at times far in the past.
In this paper we explore the long-term dependencies problem for a class of
architectures called NARX recurrent neural networks, which have powerful
representational capabilities. We have previously reported that gradient
descent learning is more effective in NARX networks than in recurrent
neural network architectures that have ``hidden states'' on problems
including grammatical inference and nonlinear system identification.
Typically, the network converges much faster and generalizes better than
other networks. The results in this paper are an attempt to explain this
phenomenon.
We present some experimental results which show that NARX networks
can often retain information for two to three times as long as conventional
recurrent neural networks. We show that although NARX networks do not
circumvent the problem of long-term dependencies, they can greatly
improve performance on long-term dependency problems.
We also describe in detail some of the assumptions regarding what it means
to latch information robustly and suggest possible ways to loosen these
assumptions.
(Also cross-referenced as UMIACS-TR-95-78)
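The defining feature of a NARX network — the next output is a function of recent inputs and recent *outputs*, with no hidden-state feedback — can be sketched as follows. The history lengths, hidden size, and tanh nonlinearity are illustrative assumptions, and the weights are random rather than trained.

```python
import numpy as np

def narx_step(u_hist, y_hist, W1, b1, w2, b2):
    # y(t) = f(u(t), ..., u(t-n_u+1), y(t-1), ..., y(t-n_y)) with f an MLP
    z = np.concatenate([u_hist, y_hist])
    h = np.tanh(W1 @ z + b1)
    return float(w2 @ h + b2)

def run_narx(u, n_u=3, n_y=3, hidden=8, seed=0):
    """Run a randomly initialized NARX network over input sequence u."""
    rng = np.random.default_rng(seed)
    W1 = rng.standard_normal((hidden, n_u + n_y)) * 0.5
    b1 = rng.standard_normal(hidden) * 0.1
    w2 = rng.standard_normal(hidden) * 0.5
    b2 = 0.0
    u_hist, y_hist = np.zeros(n_u), np.zeros(n_y)
    ys = []
    for ut in u:
        u_hist = np.roll(u_hist, 1); u_hist[0] = ut    # shift input taps
        y = narx_step(u_hist, y_hist, W1, b1, w2, b2)
        y_hist = np.roll(y_hist, 1); y_hist[0] = y     # feed output back
        ys.append(y)
    return ys
```

Because past outputs appear directly as inputs, gradients can reach events n_y steps back through a single weight layer rather than through many hidden-state transitions, which is one intuition for the improved long-term-dependency behavior reported above.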
Product Unit Learning
Product units provide a method of automatically learning the
higher-order input combinations required for the efficient synthesis of
Boolean logic functions by neural networks. Product units also have a
higher information capacity than sigmoidal networks. However, this
activation function has not received much attention in the literature. A
possible reason for this is that one encounters some problems when
using standard backpropagation to train networks containing these
units. This report examines these problems, and evaluates the
performance of three training algorithms on networks of this
type. Empirical results indicate that the error surface of networks
containing product units has more local minima than that of corresponding
networks with summation units. For this reason, a combination of local
and global training algorithms was found to provide the most reliable
convergence.
We then investigate how `hints' can be added to the training algorithm.
By extracting a common frequency from the input weights,
and training this frequency separately, we show that convergence can
be accelerated.
A constructive algorithm is then introduced which adds product units
to a network as required by the problem. Simulations show that
for the same problems this method creates a network with significantly
fewer neurons than those constructed by the tiling and upstart algorithms.
In order to compare their performance with other transfer functions,
product units were implemented as candidate units in the Cascade
Correlation (CC) system (Fahlman, 1990). Using these candidate units
resulted in smaller networks which trained faster than when any of
the standard (three sigmoidal types and one Gaussian) transfer
functions was used. This superiority was confirmed when a pool of
candidate units with four different nonlinear activation functions,
competing for addition to the network, was used. Extensive
simulations showed that for the problem of implementing random Boolean
logic functions, product units are always chosen above any of
the other transfer functions.
(Also cross-referenced as UMIACS-TR-95-80)
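A product unit raises each input to a learned exponent and multiplies the results, which is how the higher-order input combinations mentioned above arise directly from the weights. A minimal sketch (positive inputs assumed, since fractional exponents of negative inputs are undefined over the reals):

```python
import numpy as np

def product_unit(x, w):
    # Output = prod_i x_i ** w_i, equivalently exp(sum_i w_i * log x_i);
    # the exponents w_i are the trainable weights.
    x = np.asarray(x, dtype=float)
    return float(np.prod(np.power(x, w)))

# With integer exponents a product unit realizes a monomial feature:
product_unit([2.0, 3.0], [1.0, 2.0])  # → 2 * 3**2 = 18.0
```

The log-domain form exp(Σ w_i log x_i) also hints at the training difficulty the report examines: small weight changes move the output multiplicatively, producing the rugged error surfaces noted above.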
Performance of On-Line Learning Methods in Predicting Multiprocessor Memory Access Patterns
Shared memory multiprocessors require reconfigurable interconnection
networks (INs) for scalability. These INs are reconfigured by an IN
control unit. However, these INs are often plagued by undesirable
reconfiguration time that is primarily due to control latency, the
amount of time delay that the control unit takes to decide on a
desired new IN configuration. To reduce control latency, a trainable
prediction unit (PU) was devised and added to the IN controller. The
PU's job is to anticipate and reduce control configuration time, the
major component of the control latency. Three different on-line
prediction techniques were tested to learn and predict repetitive
memory access patterns for three typical parallel processing applications,
the 2-D relaxation algorithm, matrix multiply and Fast Fourier Transform.
The predictions were then used by a routing control algorithm to reduce
control latency by configuring the IN to provide needed memory access
paths before they were requested. Three prediction techniques were used
and tested: (1) a Markov predictor, (2) a linear predictor, and (3) a
time-delay neural network (TDNN) predictor. As expected, different
predictors performed best on different applications; however, the TDNN
produced the best overall results.
(Also cross-referenced as UMIACS-TR-96-59)
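A Markov predictor of the kind listed above tracks, for each observed symbol (e.g. a memory-access identifier), its most frequent successor. This is a first-order sketch; the actual predictor's order and tie-breaking rules are assumptions.

```python
from collections import defaultdict, Counter

class MarkovPredictor:
    """First-order Markov predictor for repetitive access patterns:
    predicts the most frequent successor of the last observed symbol."""
    def __init__(self):
        self.counts = defaultdict(Counter)  # counts[a][b] = # of a -> b transitions
        self.last = None

    def observe(self, symbol):
        if self.last is not None:
            self.counts[self.last][symbol] += 1
        self.last = symbol

    def predict(self):
        succ = self.counts.get(self.last)
        if not succ:
            return None                     # no history for this symbol yet
        return succ.most_common(1)[0][0]

# Repetitive pattern, as in the 2-D relaxation or FFT access traces
p = MarkovPredictor()
for addr in [0, 1, 2, 0, 1, 2, 0, 1]:
    p.observe(addr)
p.predict()  # most likely successor of 1 → 2
```

In the controller setting described above, the predicted address would drive IN reconfiguration before the request arrives, hiding control latency when the prediction is correct.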